Improving EM Algorithm Estimates for Record Linkage Parameters
Abstract
The EM algorithm can be used to estimate conditional probabilities for matching field patterns for the Fellegi-Sunter model for record linkage. The algorithm is based on a latent class model for the record pairs where one of the classes is the set of true matches. If the number of true match pairs in the data set is too small, then the EM algorithm cannot detect the correct latent class. We consider methods for enriching the density of matches in the set of examined record pairs in order to obtain improved EM algorithm estimates for the record linkage conditional probability parameters.
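The latent class model described in the abstract can be sketched as a two-class mixture over field agreement patterns. The code below is a minimal illustration, not the paper's implementation: it assumes binary agree/disagree comparisons and conditional independence of fields given match status, and every name in it (`em_fellegi_sunter`, `simulate_pairs`, the simulated rates) is hypothetical.

```python
import random
from collections import Counter


def clamp(x, eps=1e-6):
    """Keep a probability strictly inside (0, 1) to avoid degenerate estimates."""
    return min(max(x, eps), 1.0 - eps)


def em_fellegi_sunter(patterns, n_iter=200, p=0.1, m_init=0.9, u_init=0.1):
    """Estimate p = P(match), m[j] = P(field j agrees | match) and
    u[j] = P(field j agrees | nonmatch) from 0/1 agreement patterns."""
    counts = Counter(patterns)          # EM only needs pattern frequencies
    n = sum(counts.values())
    k = len(next(iter(counts)))
    m = [m_init] * k
    u = [u_init] * k
    for _ in range(n_iter):
        # E-step: posterior probability that a pair with this pattern is a match
        post = {}
        for gamma in counts:
            pm, pu = p, 1.0 - p
            for j, agrees in enumerate(gamma):
                pm *= m[j] if agrees else 1.0 - m[j]
                pu *= u[j] if agrees else 1.0 - u[j]
            post[gamma] = pm / (pm + pu)
        # M-step: re-estimate parameters from the expected class memberships
        w_match = sum(c * post[g] for g, c in counts.items())
        w_non = n - w_match
        p = clamp(w_match / n)
        for j in range(k):
            agree_m = sum(c * post[g] * g[j] for g, c in counts.items())
            agree_u = sum(c * (1.0 - post[g]) * g[j] for g, c in counts.items())
            m[j] = clamp(agree_m / max(w_match, 1e-6))
            u[j] = clamp(agree_u / max(w_non, 1e-6))
    return p, m, u


def simulate_pairs(n, match_rate, m_true, u_true, seed=0):
    """Draw synthetic agreement patterns (illustrative data only)."""
    rng = random.Random(seed)
    patterns = []
    for _ in range(n):
        probs = m_true if rng.random() < match_rate else u_true
        patterns.append(tuple(1 if rng.random() < q else 0 for q in probs))
    return patterns


# Simulated example with assumed parameters; 20% of pairs are true matches.
pairs = simulate_pairs(5000, 0.2, [0.95, 0.9, 0.85, 0.9], [0.1, 0.05, 0.2, 0.1])
p_hat, m_hat, u_hat = em_fellegi_sunter(pairs)
```

Rerunning the simulation with a much smaller `match_rate` is one way to explore the failure mode the abstract describes, where the match class is too sparse for EM to isolate it.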
Similar resources
Analysis of a Probabilistic Record Linkage Technique without Human Review
We previously developed a deterministic record linkage algorithm demonstrating sensitivities approaching 90% while maintaining 100% specificity. Substantially better performance has been reported using probabilistic linkage techniques; however, such methods often incorporate human review into the process. To avoid human review, we employed an estimator function using the Expectation Maximizatio...
Using the EM Algorithm for Weight Computation in the Fellegi-Sunter Model of Record Linkage
Let A×B be the product space of two sets A and B, which is divided into matches (pairs representing the same entity) and nonmatches (pairs representing different entities). Linkage rules are those that divide A×B into links (designated matches), possible links (pairs for which we delay a decision), and nonlinks (designated nonmatches). Under fixed bounds on the error rates, Fellegi and Sunter (1969) p...
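The three-way split into links, possible links, and nonlinks can be illustrated with a log-likelihood-ratio weight and two cutoffs. This is a sketch under assumptions: fields are treated as conditionally independent, the m and u probabilities are made-up values, and the thresholds `lower` and `upper` are arbitrary here, whereas in the Fellegi-Sunter framework they follow from the chosen error bounds. The function names are hypothetical.

```python
import math


def match_weight(gamma, m, u):
    """Sum of per-field log2 likelihood ratios for an agreement pattern gamma."""
    weight = 0.0
    for agrees, mj, uj in zip(gamma, m, u):
        if agrees:
            weight += math.log2(mj / uj)
        else:
            weight += math.log2((1.0 - mj) / (1.0 - uj))
    return weight


def classify(gamma, m, u, lower, upper):
    """Assign a pair to link, possible link, or nonlink by its weight."""
    w = match_weight(gamma, m, u)
    if w >= upper:
        return "link"
    if w <= lower:
        return "nonlink"
    return "possible link"


# Assumed parameters for four comparison fields (illustrative only).
m_demo = [0.9, 0.9, 0.9, 0.9]
u_demo = [0.1, 0.1, 0.1, 0.1]

all_agree = classify((1, 1, 1, 1), m_demo, u_demo, lower=-3.0, upper=3.0)
half_agree = classify((1, 1, 0, 0), m_demo, u_demo, lower=-3.0, upper=3.0)
none_agree = classify((0, 0, 0, 0), m_demo, u_demo, lower=-3.0, upper=3.0)
```

With these assumed values, full agreement yields a large positive weight, full disagreement a large negative one, and the mixed pattern falls between the two cutoffs.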
Comparison of Estimates Using Record Statistics from Lomax Model: Bayesian and Non Bayesian Approaches
This paper addresses the problem of Bayesian estimation of the parameters, reliability and hazard function in the context of record statistics values from the two-parameter Lomax distribution. The ML and the Bayes estimates based on records are derived for the two unknown parameters and the survival time parameters, reliability and hazard functions. The Bayes estimates are obtained based on conju...
Methods for Record Linkage and Bayesian Networks
Although terminology differs, there is considerable overlap between record linkage methods based on the Fellegi-Sunter model (JASA 1969) and Bayesian networks used in machine learning (Mitchell 1997). Both are based on formal probabilistic models that can be shown to be equivalent in many situations (Winkler 2000). When no missing data are present in identifying fields and training data are ava...
Supplemental Material: A Hidden Markov Approach for Ascertaining SNP Genotypes from Next Generation Sequencing Data in Presence of Allelic Imbalance by Exploiting Linkage Disequilibrium
In this section, we provide details of the EM algorithm for obtaining the maximum likelihood estimates (MLE) of $\theta$, where $\theta = (\alpha_1, \beta_1, \alpha_2, \beta_2, e, A)^T$ and $A = (a_{kk'})_{k,k'=1,\dots,M}$ are parameters in the transition matrix. To this end, we introduce the following complete data corresponding to the observed data $X$: $Y = \{G_{il}, \delta_{il}, X_{il} : l = 1, \dots, L\}$ for $i = 1, \dots, n$. The likelihood function for...
Publication date: 2002